NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

X-Blossom: Massive Parallelization of Graph Maximum Matching

https://doi.org/10.14778/3748191.3748199

Fan, Dayi; Lee, Rubao; Zhang, Xiaodong (June 2025, Proceedings of the VLDB Endowment)

The blossom algorithm computes maximum matchings in graphs and has been widely applied across diverse domains, including machine learning, economic analysis, and other essential data analytics applications. As data scales and the demand for real-time processing intensifies, high-performance computing solutions have become indispensable. Over the years, substantial research efforts have been dedicated to improving the sequential blossom algorithm. However, developing an efficient parallel solution remains highly challenging due to the algorithm's intricate execution patterns, sequential recursive dependencies, dynamic data structure modifications, and inefficient path search. By thoroughly analyzing existing solutions, we have identified critical issues and proposed a new parallel framework called X-Blossom. This framework eliminates recursion entirely, enables efficient searches for multiple disjoint paths, and employs a simple path table to trace paths, removing the need for dynamic graphs and trees. These efforts in algorithm development result in significant performance enhancement. Extensive experiments on real-world datasets show that X-Blossom outperforms all existing solutions, achieving up to 992x speedup compared to the fastest sequential baseline, and an average of 431x speedup over the state-of-the-art parallel solution using 8 cores. It also demonstrates excellent scalability, achieving an average speedup of 1.72x when threads double in scalability tests to 64 cores. To the best of our knowledge, X-Blossom is the fastest solution for this class of graph algorithms.
more » « less
Free, publicly-accessible full text available June 1, 2026
A Case Study for Ray Tracing Cores: Performance Insights with Breadth-First Search and Triangle Counting in Graphs

https://doi.org/10.1145/3727108

Xiao, Zhixiong; Xiao, Mengbai; Yuan, Yuan; Yu, Dongxiao; Lee, Rubao; Zhang, Xiaodong (May 2025, Proceedings of the ACM on Measurement and Analysis of Computing Systems)

The emerging Ray-tracing cores on GPUs have been repurposed for non-ray-tracing tasks by researchers recently. In this paper, we explore the benefits and effectiveness of executing graph algorithms on RT cores. We re-design breadth-first search and triangle counting on the new hardware as graph algorithm representatives. Our implementations focus on how to convert the graph operations to bounding volume hierarchy construction and ray generation, which are computational paradigms specific to ray tracing. We evaluate our RT-based methods on a wide range of real-world datasets. The results do not show the advantage of the RT-based methods over CUDA-based methods. We extend the experiments to the set intersection workload on synthesized datasets, and the RT-based method shows superior performance when the skew ratio is high. By carefully comparing the RT-based and CUDA-based binary search, we discover that RT cores are more efficient at searching for elements, but this comes with a constant and non-trivial overhead of the execution pipeline. Furthermore, the overhead of BVH construction is substantially higher than sorting on CUDA cores for large datasets. Our case studies unveil several rules of adapting graph algorithms to ray-tracing cores that might benefit future evolution of the emerging hardware towards general-computing tasks.
more » « less
Free, publicly-accessible full text available May 27, 2026
LibRTS: A Spatial Indexing Library by Ray Tracing

https://doi.org/10.1145/3710848.3710850

Geng, Liang; Lee, Rubao; Zhang, Xiaodong (February 2025, ACM)

The Ray-Tracing (RT) core has become a widely integrated feature in modern GPUs to accelerate ray-tracing rendering. Recent research has shown that RT cores can also be repurposed to accelerate non-rendering workloads. Since the RT core essentially serves as a hardware accelerator for Bounding Volume Hierarchy (BVH) tree traversal, it holds the potential to significantly improve the performance of spatial workloads. However, the specialized RT programming model poses challenges for using RT cores in these scenarios. Inspired by the core functionality of RT cores, we designed and implemented LibRTS, a spatial index library that leverages RT cores to accelerate spatial queries. LibRTS supports both point and range queries and remains mutable to accommodate changing data. Instead of relying on a case-by-case approach, LibRTS provides a general, highperformance spatial indexing framework for spatial data processing. By formulating spatial queries as RT-suitable problems and overcoming load-balancing challenges, LibRTS delivers superior query performance through RT cores without requiring developers to master complex programming on this specialized hardware. Compared to CPU and GPU spatial libraries, LibRTS achieves speedups of up to 85.1x for point queries, 94.0x for range-contains queries, and 11.0x for range-intersects queries. In a real-world application, pointin-polygon testing, LibRTS also surpasses the state-of-the-art RT method by up to 3.8x.
more » « less
Free, publicly-accessible full text available February 28, 2026
Data Management: Interactions with Computer Architecture and Systems

https://doi.org/10.1017/9781009122115

Zhang, Xiaodong; Lee, Rubao (November 2024, Cambridge University Press)

This guide illuminates the intricate relationship between data management, computer architecture, and system software. It traces the evolution of computing to today's data-centric focus and underscores the importance of hardware-software co-design in achieving efficient data processing systems with high throughput and low latency. The thorough coverage includes topics such as logical data formats, memory architecture, GPU programming, and the innovative use of ray tracing in computational tasks. Special emphasis is placed on minimizing data movement within memory hierarchies and optimizing data storage and retrieval. Tailored for professionals and students in computer science, this book combines theoretical foundations with practical applications, making it an indispensable resource for anyone wanting to master the synergies between data management and computing infrastructure.
more » « less
Full Text Available
High-Performance Spatial Data Analytics: Systematic R&D for Scale-Out and Scale-Up Solutions from the Past to Now

Wang, Fusheng; Lee, Rubao; Teng, Dejun; Zhang, Xiaodong; Salz, Joel (August 2024, Proceedings of the VLDB Endowment)

We released open-source software Hadoop-GIS in 2011, and presented and published the work in VLDB 2013. This work initiated the development of a new spatial data analytical ecosystem characterized by its large-scale capacity in both computing and data storage, high scalability, compatibility with low-cost commodity processors in clusters and open-source software. After more than a decade of research and development, this ecosystem has matured and is now serving many applications across various fields. In this paper, we provide the background on why we started this project and give an overview of the original Hadoop-GIS software architecture, along with its unique technical contributions and legacy. We present the evolution of the ecosystem and its current state-of-the- art, which has been influenced by the Hadoop-GIS project. We also describe the ongoing efforts to further enhance this ecosystem with hardware accelerations to meet the increasing demands for low latency and high throughput in various spatial data analysis tasks. Finally, we will summarize the insights gained and lessons learned over more than a decade in pursuing high-performance spatial data analytics.
more » « less
Full Text Available
High-Performance Spatial Data Analytics: Systematic R&D for Scale-Out and Scale-Up Solutions from the Past to Now

Wang, Fusheng; Lee, Rubao; Teng, Dejun; Zhang, Xiaodong; Saltz, Joel (August 2024, Proceedings of the VLDB Endowment)

We released open-source software Hadoop-GIS in 2011, and presented and published the work in VLDB 2013. This work initiated the development of a new spatial data analytical ecosystem characterized by its large-scale capacity in both computing and data storage, high scalability, compatibility with low-cost commodity processors in clusters and open-source software. After more than a decade of research and development, this ecosystem has matured and is now serving many applications across various fields. In this paper, we provide the background on why we started this project and give an overview of the original Hadoop-GIS software architecture, along with its unique technical contributions and legacy. We present the evolution of the ecosystem and its current state-of the-art, which has been influenced by the Hadoop-GIS project. We also describe the ongoing efforts to further enhance this ecosystem with hardware accelerations to meet the increasing demands for low latency and high throughput in various spatial data analysis tasks. Finally, we will summarize the insights gained and lessons learned over more than a decade in pursuing high-performance spatial data analytics.
more » « less
Full Text Available
High-Performance Spatial Data Analytics: Systematic R&D for Scale-Out and Scale-Up Solutions from the Past to Now

https://doi.org/10.14778/3685800.3685912

Wang, Fusheng; Lee, Rubao; Teng, Dejun; Zhang, Xiaodong; Saltz, Joel (August 2024, Proceedings of the VLDB Endowment)

We released open-source software Hadoop-GIS in 2011, and presented and published the work in VLDB 2013. This work initiated the development of a new spatial data analytical ecosystem characterized by its large-scale capacity in both computing and data storage, high scalability, compatibility with low-cost commodity processors in clusters and open-source software. After more than a decade of research and development, this ecosystem has matured and is now serving many applications across various fields. In this paper, we provide the background on why we started this project and give an overview of the original Hadoop-GIS software architecture, along with its unique technical contributions and legacy. We present the evolution of the ecosystem and its current state-of-the-art, which has been influenced by the Hadoop-GIS project. We also describe the ongoing efforts to further enhance this ecosystem with hardware accelerations to meet the increasing demands for low latency and high throughput in various spatial data analysis tasks. Finally, we will summarize the insights gained and lessons learned over more than a decade in pursuing high-performance spatial data analytics.
more » « less
Full Text Available
RayJoin: Fast and Precise Spatial Join

https://doi.org/10.1145/3650200.3656610

Geng, Liang; Lee, Rubao; Zhang, Xiaodong (May 2024, https://doi.org/10.1145/3650200.3656610)

Full Text Available
UltraPrecise: A GPU-Based Framework for Arbitrary-Precision Arithmetic in Database Systems

https://doi.org/10.1109/ICDE60146.2024.00294

Li, Xin; Xiao, Mengbai; Yu, Dongxiao; Lee, Rubao; Zhang, Xiaodong (May 2024, IEEE)

Fixed-point decimal operations in databases with arbitrary-precision arithmetic refer to the ability to store and operate decimal fraction numbers with an arbitrary length of digits. This type of operation has become a requirement for many applications, including scientific databases, financial data processing, geometric data processing, and cryptography. However, the state-of-the-art fixed-point decimal technology either provides high performance for low-precision operations or supports arbitrary-precision arithmetic operations at low performance. In this paper, we present a design and implementation of a framework called UltraPrecise which supports arbitraryprecision arithmetic for databases on GPU, aiming to gain high performance for arbitrary-precision arithmetic operations. We build our framework based on the just-in-time compilation technique and optimize its performance via data representation design, PTX acceleration, and expression scheduling. UltraPrecise achieves comparable performance to other high-performance databases for low-precision arithmetic operations. For highprecision, we show that UltraPrecise consistently outperforms existing databases by two orders of magnitude, including workloads of RSA encryption and trigonometric function approximation.
more » « less
Full Text Available
X-TED: Massive Parallelization of Tree Edit Distance

https://doi.org/10.14778/3654621.3654634

Fan, Dayi; Lee, Rubao; Zhang, Xiaodong (March 2024, Proceedings of the VLDB Endowment)

The tree edit distance (TED) has been found in a wide spectrum of applications in artificial intelligence, bioinformatics, and other areas, which serves as a metric to quantify the dissimilarity between two trees. As applications continue to scale in data size, with a growing demand for fast response time, TED has become even more increasingly data- and computing-intensive. Over the years, researchers have made dedicated efforts to improve sequential TED algorithms by reducing their high complexity. However, achieving efficient parallel TED computation in both algorithm and implementation is challenging due to its dynamic programming nature involving non-trivial issues of data dependency, runtime execution pattern changes, and optimal utilization of limited parallel resources. Having comprehensively investigated the bottlenecks in the existing parallel TED algorithms, we develop a massive parallel computation framework for TED and its implementation on GPU, which is called X-TED. For a given TED computation, X-TED applies a fast preprocessing algorithm to identify dependency relationships among millions of dynamic programming tables. Subsequently, it adopts a dynamic parallel strategy to handle various processing stages, aiming to best utilize GPU cores and the limited device memory in an adaptive and automatic way. Our intensive experimental results demonstrate that X-TED surpasses all existing solutions, achieving up to 42x speedup over the state-of-the-art sequential AP-TED, and outperforming the existing multicore parallel MC-TED by an average speedup of 31x.
more » « less
Full Text Available

« Prev Next »

Search for: All records